196 research outputs found
Recommended from our members
Uncovering Features in Behaviorally Similar Programs
The detection of similar code can support many software engineering tasks such as program understanding and program classification. Many excellent approaches have been proposed to detect programs having similar syntactic features. However, these approaches are unable to identify programs dynamically or statistically close to each other, which we call behaviorally similar programs. We believe the detection of behaviorally similar programs can enhance or even automate the tasks relevant to program classification. In this thesis, we will discuss our current approaches to identify programs having similar behavioral features in multiple perspectives.
We first discuss how to detect programs having similar functionality. While the definition of a program’s functionality is undecidable, we use inputs and outputs (I/Os) of programs as the proxy of their functionality. We then use I/Os of programs as a behavioral feature to detect which programs are functionally similar: two programs are functionally similar if they share similar inputs and outputs. This approach has been studied and developed in the C language to detect functionally equivalent programs having equivalent I/Os. Nevertheless, some natural problems in Object Oriented languages, such as input generation and comparisons between application-specific data types, hinder the development of this approach. We propose a new technique, in-vivo detection, which uses existing and meaningful inputs to drive applications systematically and then applies a novel similarity model considering both inputs and outputs of programs, to detect functionally similar programs. We develop the tool, HitoshiIO, based on our in-vivo detection. In the subjects that we study, HitoshiIO correctly detects 68.4% of functionally similar programs, while its false positive rate is only 16.6%.
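As a rough illustration of using I/Os as a proxy for functionality, the sketch below scores two functions by the Jaccard overlap of their observed (input, output) pairs. It is not HitoshiIO's similarity model: the helper names, the sample functions and the choice of Jaccard overlap are all assumptions made for this example.

```python
def io_profile(func, inputs):
    """Record the (input, output) pairs observed when running func."""
    return {(x, func(x)) for x in inputs}


def io_similarity(f, g, inputs):
    """Jaccard overlap of the I/O pairs of f and g (assumed metric)."""
    inputs = list(inputs)  # allow any iterable, reuse it for both functions
    a, b = io_profile(f, inputs), io_profile(g, inputs)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Hypothetical subjects: two syntactically different but equivalent
# implementations, plus one unrelated function.
def sum_loop(n):
    total = 0
    for i in range(n + 1):
        total += i
    return total


def sum_formula(n):
    return n * (n + 1) // 2


def square(n):
    return n * n


sample_inputs = list(range(10))
equivalent_score = io_similarity(sum_loop, sum_formula, sample_inputs)
unrelated_score = io_similarity(sum_loop, square, sample_inputs)
```

Here the two summation functions share every observed I/O pair, while `square` agrees with them only on the inputs 0 and 1, so its score is much lower.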
In addition to functional I/Os of programs, we attempt to discover programs having similar execution behavior. Again, the execution behavior of a program can be undecidable, so we use instructions executed at run-time as a behavioral feature of a program. We create DyCLINK, which observes program executions and encodes them in dynamic instruction graphs. A vertex in a dynamic instruction graph is an instruction and an edge is a type of dependency between two instructions. The problem to detect which programs have similar executions can then be reduced to a problem of solving inexact graph isomorphism. We propose a link analysis based algorithm, LinkSub, which vectorizes each dynamic instruction graph by the importance of every instruction, to solve this graph isomorphism problem efficiently. In a K Nearest Neighbor (KNN) based program classification experiment, DyCLINK achieves 90+% precision.
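To make the idea of vectorizing a graph by instruction importance more concrete, here is a minimal sketch that ranks the vertices of a tiny instruction graph with a plain PageRank power iteration and lists the instructions in rank order. This is not the LinkSub algorithm itself: the graph, the instruction labels, the damping factor and the tie-breaking rule are assumptions, and edge (dependency) types are ignored.

```python
def pagerank(nodes, edges, damping=0.85, iters=50):
    """Plain PageRank by power iteration over a directed graph."""
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    out = {v: [] for v in nodes}
    for src, dst in edges:
        out[src].append(dst)
    for _ in range(iters):
        nxt = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            # A vertex without outgoing edges spreads its rank evenly.
            targets = out[v] or nodes
            share = damping * rank[v] / len(targets)
            for t in targets:
                nxt[t] += share
        rank = nxt
    return rank


def vectorize(nodes, edges):
    """Order instructions by descending rank, breaking ties by label."""
    rank = pagerank(nodes, edges)
    return sorted(nodes, key=lambda v: (-rank[v], v))


# Hypothetical graph for "return a + b": two loads feed an add,
# and the add feeds the return.
nodes = ["iload_1", "iload_2", "iadd", "ireturn"]
edges = [("iload_1", "iadd"), ("iload_2", "iadd"), ("iadd", "ireturn")]
ranks = pagerank(nodes, edges)
ordering = vectorize(nodes, edges)
```

Once each graph is flattened into such an ordering, two executions can be compared with a sequence or vector distance instead of an exact graph match, which is the kind of shortcut the abstract describes.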
Because HitoshiIO and DyCLINK both rely on dynamic analysis to expose program behavior, they have better capability to locate and search for behaviorally similar programs than traditional static analysis tools. However, they suffer from some common problems of dynamic analysis, such as input generation and run-time overhead. These problems may make our approaches challenging to scale. Thus, we create the system, Macneto, which integrates static analysis with machine topic modeling and deep learning to approximate program behaviors from their binaries without truly executing programs. In our deobfuscation experiments considering two commercial obfuscators that alter lexical information and syntax in programs, Macneto achieves 90+% precision, where the ground truth is that the behavior of a program before and after obfuscation should be the same.
In this thesis, we offer a more extensive view of similar programs than the traditional definitions. While the traditional definitions of similar programs mostly use static features, such as syntax and lexical information, we propose to leverage the power of dynamic analysis and machine learning models to trace/collect behavioral features of programs. These behavioral features can then be applied to detect behaviorally similar programs. We believe the techniques we invented in this thesis to detect behaviorally similar programs can improve the development of software engineering and security applications, such as code search and deobfuscation.
Identifying Functionally Similar Code in Complex Codebases
Identifying similar code in software systems can assist many software engineering tasks, including program understanding. While most approaches focus on identifying code that looks alike, some researchers propose to detect instead code that functions alike, known as functional clones. However, previous work has raised the technical challenges of detecting these functional clones in object oriented languages such as Java. We propose a novel technique, In-Vivo Clone Detection, a language-agnostic technique that detects functional clones in arbitrary programs by observing and mining inputs and outputs. We implemented this technique targeting programs that run on the JVM, creating HitoshiIO (available freely on GitHub), a tool to detect functional code clones. Our experimental results show that it is powerful in detecting these functional clones, finding 185 methods that are functionally similar across a corpus of 118 projects, even when there are only very few inputs available. In a random sample of the detected clones, HitoshiIO achieves 68+% true positive rate, while the false positive rate is only 15%.
Metamorphic Runtime Checking of Applications without Test Oracles
Challenges arise in testing applications that do not have test oracles, i.e., for which it is impossible or impractical to know what the correct output should be for general input. Metamorphic testing, introduced by Chen et al., has been shown to be a simple yet effective technique in testing these types of applications: test inputs are transformed in such a way that it is possible to predict the expected change to the output, and if the output resulting from this transformation is not as expected, then a fault must exist. Here, we improve upon previous work by presenting a new technique called Metamorphic Runtime Checking, which automatically conducts metamorphic testing of both the entire application and individual functions during a program's execution. This new approach improves the scope, scale, and sensitivity of metamorphic testing by allowing for the identification of more properties and execution of more tests, and increasing the likelihood of detecting faults that would not be found by application-level properties alone. We also discuss a technique for automatically discovering functions' metamorphic properties, and present the results of new studies that demonstrate that Metamorphic Runtime Checking advances the state of the art in testing applications without oracles.
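The core idea of metamorphic testing can be shown with a small function-level check. The sketch below is only an illustration, not the Metamorphic Runtime Checking framework: the function under test, the two properties (permutation invariance and scaling) and all names are assumptions chosen for the example.

```python
import math
import random


def std_dev(values):
    """Population standard deviation (function under test)."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))


def std_dev_buggy(values):
    """Deliberately faulty variant used to show a violated property."""
    return std_dev(values) + 1.0


def check_metamorphic(func, values, rel_tol=1e-9):
    """Check two assumed metamorphic properties without an oracle."""
    base = func(values)

    # Property 1: permuting the input must not change the output.
    shuffled = list(values)
    random.shuffle(shuffled)
    permutation_ok = math.isclose(func(shuffled), base, rel_tol=rel_tol)

    # Property 2: doubling every input must double the output.
    doubled = [2 * v for v in values]
    scaling_ok = math.isclose(func(doubled), 2 * base, rel_tol=rel_tol)

    return permutation_ok and scaling_ok


data = [1.0, 2.0, 4.0, 8.0]
correct_passes = check_metamorphic(std_dev, data)
buggy_passes = check_metamorphic(std_dev_buggy, data)
```

Neither check needs to know the true standard deviation of `data`; the faulty variant is exposed only because its output does not scale as the property predicts.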
Code Relatives: Detecting Similar Software Behavior
Detecting "similar code" is fundamental to many software engineering tasks. Current tools can help detect code with statically similar syntactic features (code clones). Unfortunately, some code fragments that behave alike without similar syntax may be missed. In this paper, we propose the term "code relatives" to refer to code with dynamically similar execution features. Code relatives can be used for such tasks as implementation-agnostic code search and classification of code with similar behavior for human understanding, which code clone detection cannot achieve. To detect code relatives, we present DyCLINK, which constructs an approximate runtime representation of code using a dynamic instruction graph. With our link analysis based subgraph matching algorithm, DyCLINK detects fine-grained code relatives efficiently. In our experiments, DyCLINK analyzed 290+ million prospective subgraph matches. The results show that DyCLINK detects not only code relatives, but also code clones that the state-of-the-art system is unable to identify. In a code classification problem, DyCLINK achieved 96% precision on average compared with the competitor's 61%.
Dynamic Inference of Likely Metamorphic Properties to Support Differential Testing
Metamorphic testing is an advanced technique for testing programs that lack a true test oracle, such as machine learning applications. Because these programs have no general oracle to identify their correctness, traditional testing techniques such as unit testing may not be helpful for developers to detect potential bugs. This paper presents a novel system, Kabu, which can dynamically infer properties of methods' states in programs that describe the characteristics of a method before and after transforming its input. These Metamorphic Properties (MPs) are pivotal to detecting potential bugs in programs without test oracles, but most previous work relies solely on human effort to identify them and only considers MPs between input parameters and output result (return value) of a program or method. This paper also proposes a testing concept, Metamorphic Differential Testing (MDT). By detecting different sets of MPs between different versions for the same method, Kabu reports potential bugs for human review. We have performed a preliminary evaluation of Kabu by comparing the MPs detected by humans with the MPs detected by Kabu. Our preliminary results are promising: Kabu can find more MPs than human developers, and MDT is effective at detecting function changes in methods.
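The differential idea, comparing the sets of properties that hold for two versions of a method, can be sketched as follows. This is a simplified illustration and not Kabu: it only considers relations between input and return value (Kabu also infers properties over method states), and the candidate properties, version functions and helper names are assumptions.

```python
def infer_properties(func, inputs, candidates):
    """Return the names of candidate properties that hold on all inputs."""
    return {
        name
        for name, holds in candidates.items()
        if all(holds(func, x) for x in inputs)
    }


def differential_report(old, new, inputs, candidates):
    """Properties that hold for exactly one of the two versions."""
    old_mps = infer_properties(old, inputs, candidates)
    new_mps = infer_properties(new, inputs, candidates)
    return old_mps ^ new_mps  # symmetric difference


# Hypothetical candidate metamorphic properties.
candidates = {
    "negate_input_keeps_output": lambda f, x: f(-x) == f(x),
    "double_input_doubles_output": lambda f, x: f(2 * x) == 2 * f(x),
}


def magnitude_v1(x):
    """Original version: absolute value."""
    return abs(x)


def magnitude_v2(x):
    """Changed version: accidentally drops the sign handling."""
    return x


inputs = [-3, -1, 0, 2, 5]
changed = differential_report(magnitude_v1, magnitude_v2, inputs, candidates)
```

The report contains only the property whose status changed between versions, which is what would be handed to a developer for review.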
TPMD: a database and resources of microsatellite marker genotyped in Taiwanese populations
Taiwan Polymorphic Marker Database (TPMD) (http://tpmd.nhri.org.tw/) is a marker database designed to provide experimental details and useful information on markers allelotyped in Taiwanese populations, accompanied by resources and technical support. The current version holds more than 372 000 allelotyping data points from 1425 frequently used, fluorescent-labeled microsatellite markers with dinucleotide, trinucleotide and tetranucleotide variation types. TPMD contains text and map displays with searchable and retrievable options for marker names, chromosomal location in various human genome maps and marker heterozygosity in Taiwanese, Japanese and Caucasian populations. The integration of marker information in the map display is useful for selecting commonly used microsatellite markers with high heterozygosity to refine the mapping of disease loci, followed by identification of disease genes by positional candidate cloning. In addition, our results indicated that the number of markers with heterozygosity over 0.7 is lower in Asian populations than in Caucasian populations. To increase accuracy and facilitate genetic studies using microsatellite markers, we also list markers with genotyping difficulty due to ambiguity of allele calling and recommend an optimal set of microsatellite markers for genotyping in Taiwanese, and possible extension of genotyping in other Mongoloid populations.
SLACC: Simion-based Language Agnostic Code Clones
Successful cross-language clone detection could enable researchers and
developers to create robust language migration tools, facilitate learning
additional programming languages once one is mastered, and promote reuse of
code snippets over a broader codebase. However, identifying cross-language
clones presents special challenges to the clone detection problem. A lack of
common underlying representation between arbitrary languages means detecting
clones requires one of the following solutions: 1) a static analysis framework
replicated across each targeted language with annotations matching language
features across all languages, or 2) a dynamic analysis framework that detects
clones based on runtime behavior.
In this work, we demonstrate the feasibility of the latter solution, a
dynamic analysis approach called SLACC for cross-language clone detection. Like
prior clone detection techniques, we use input/output behavior to match clones,
though we overcome limitations of prior work by amplifying the number of inputs
and covering more data types; and as a result, achieve better clusters than
prior attempts. Since clusters are generated based on input/output behavior,
SLACC supports cross-language clone detection. As an added challenge, we target
a static typed language, Java, and a dynamic typed language, Python. Compared
to HitoshiIO, a recent clone detection tool for Java, SLACC retrieves 6 times
as many clusters and has higher precision (86.7% vs. 30.7%).
This is the first work to perform clone detection for dynamic typed languages
(precision = 87.3%) and the first to perform clone detection across languages
that lack a common underlying representation (precision = 94.1%). It provides a
first step towards the larger goal of scalable language migration tools.
Comment: 11 Pages, 3 Figures, Accepted at ICSE 2020 technical track
Long-term results of intensity-modulated radiotherapy concomitant with chemotherapy for hypopharyngeal carcinoma aimed at laryngeal preservation
Background: The objective of this retrospective study is to investigate laryngeal preservation and long-term treatment results in hypopharyngeal carcinoma treated with intensity-modulated radiotherapy (IMRT) combined with chemotherapy. Methods: Twenty-seven patients with hypopharyngeal carcinoma (stage II-IV) were enrolled and underwent concurrent chemoradiotherapy. The chemotherapy regimens were monthly cisplatin and 5-fluorouracil for six patients and weekly cisplatin for 19 patients. All patients were treated with IMRT with simultaneous integrated boost technique. Acute and late toxicities were recorded based on CTCAE 3.0 (Common Terminology Criteria for Adverse Events). Results: The median follow-up time for survivors was 53.0 months (range 36-82 months). The initial complete response rate was 85.2%, with a laryngeal preservation rate of 63.0%. The 5-year functional laryngeal, local-regional control, disease-free and overall survival rates were 59.7%, 63.3%, 51.0% and 34.8%, respectively. The most common greater than or equal to grade 3 acute and late effects were dysphagia (63.0%, 17 of 27 patients) and laryngeal stricture (18.5%, 5 of 27 patients), respectively. Patients belonging to the high risk group showed significantly higher risk of tracheostomy compared to the low risk group (p = 0.014). Conclusions: After long-term follow-up, our results confirmed that patients with hypopharyngeal carcinoma treated with IMRT concurrent with platinum-based chemotherapy attain high functional laryngeal and local-regional control survival rates. However, the late effect of laryngeal stricture remains a problem, particularly for high risk group patients.
Indoor CO2 monitoring in a surgical intensive care unit under visitation restrictions during the COVID-19 pandemic
Background: Indoor CO2 concentration is an important metric of indoor air quality (IAQ). The dynamic temporal pattern of CO2 levels in intensive care units (ICUs), where healthcare providers experience high cognitive load and occupant numbers are frequently changing, has not been comprehensively characterized. Objective: We attempted to describe the dynamic change in CO2 levels in the ICU using an Internet of Things-based (IoT-based) monitoring system. Specifically, given that the COVID-19 pandemic makes hospital visitation restrictions necessary worldwide, this study aimed to appraise the impact of visitation restrictions on CO2 levels in the ICU. Methods: Since February 2020, an IoT-based intelligent indoor environment monitoring system has been implemented in a 24-bed university hospital ICU, which is symmetrically divided into areas A and B. One sensor was placed at the workstation of each area for continuous monitoring. The data of CO2 and other pollutants (e.g., PM2.5) measured under standard and restricted visitation policies during the COVID-19 pandemic were retrieved for analysis. Additionally, the CO2 levels were compared between workdays and non-working days and between areas A and B. Results: The median CO2 level (interquartile range [IQR]) was 616 (524–682) ppm, and only 979 (0.34%) data points obtained in area A during standard visitation were ≥ 1,000 ppm. The CO2 concentrations were significantly lower during restricted visitation (median [IQR]: 576 [556–596] ppm) than during standard visitation (628 [602–663] ppm; p < 0.001). The PM2.5 concentrations were significantly lower during restricted visitation (median [IQR]: 1 [0–1] μg/m3) than during standard visitation (2 [1–3] μg/m3; p < 0.001). The daily CO2 and PM2.5 levels were relatively low at night and elevated as the occupant number increased during clinical handover and visitation. The CO2 concentrations were significantly higher in area A (median [IQR]: 681 [653–712] ppm) than in area B (524 [504–547] ppm; p < 0.001). The CO2 concentrations were significantly lower on non-working days (median [IQR]: 606 [587–671] ppm) than on workdays (583 [573–600] ppm; p < 0.001). Conclusion: Our study suggests that visitation restrictions during the COVID-19 pandemic may affect CO2 levels in the ICU. Implementation of the IoT-based IAQ sensing network system may facilitate the monitoring of indoor CO2 levels.